Treebank-Based Probabilistic Phrase Structure Parsing
Author
Abstract
Probabilistic phrase structure parsing has been a central and active area of research in computational linguistics. More generally, stochastic methods in natural language processing have become increasingly popular as more resources have become available. One of the main advantages of probabilistic parsing lies in disambiguation: it is useful for a parser to return a ranked list of potential syntactic analyses for a string, so that the most probable analysis can be selected. In this article, we introduce probabilistic context-free grammars (PCFGs) and outline some of their strengths and weaknesses. We concentrate on the automatic extraction of stochastic grammars from treebanks (large collections of hand-corrected syntactic structures). We describe the current state of the field and current research on improving the basic PCFG model, including lexicalized, history-based and generative models. Finally, we briefly mention some research into probabilistic phrase structure parsing for domains other than traditional treebank text and for languages other than English (Chinese, Arabic, German and French).
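The core pipeline the abstract describes (read a stochastic grammar off a treebank, then use its rule probabilities to rank competing analyses of a string) can be sketched in Python. This is a minimal illustration under stated assumptions, not the article's own method: the toy treebank, the nested-tuple tree encoding, and the names `extract_pcfg`, `tree_probability` and `rank` are all hypothetical. Rule probabilities are estimated by relative frequency (maximum likelihood), and a tree's probability is the product of the probabilities of the rules it uses. The sketch only scores candidate trees handed to it; it does not include a parser (such as CKY) that would generate those candidates.

```python
from collections import Counter
from math import prod

# A tree is a nested tuple (label, child, ...); a leaf (word) is a plain str.
# Hypothetical toy "treebank", used only to illustrate the technique.
TREEBANK = [
    ("S", ("NP", ("N", "John")),
          ("VP", ("V", "saw"), ("NP", ("N", "Mary")),
                 ("PP", ("P", "with"), ("NP", ("N", "binoculars"))))),
    ("S", ("NP", ("N", "Mary")),
          ("VP", ("V", "saw"),
                 ("NP", ("NP", ("N", "John")),
                        ("PP", ("P", "with"), ("NP", ("N", "Sue")))))),
    ("S", ("NP", ("N", "John")), ("VP", ("V", "slept"))),
]


def productions(tree):
    """Yield every (lhs, rhs) rule used in a tree, top-down."""
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


def extract_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


def tree_probability(tree, pcfg):
    """P(tree) = product of its rule probabilities; unseen rules get 0."""
    return prod(pcfg.get(rule, 0.0) for rule in productions(tree))


def rank(candidates, pcfg):
    """Sort candidate analyses from most to least probable."""
    return sorted(candidates, key=lambda t: tree_probability(t, pcfg), reverse=True)


pcfg = extract_pcfg(TREEBANK)

# Two analyses of "John saw Mary with binoculars": PP attached to VP vs. to NP.
VP_ATTACH = ("S", ("NP", ("N", "John")),
             ("VP", ("V", "saw"), ("NP", ("N", "Mary")),
                    ("PP", ("P", "with"), ("NP", ("N", "binoculars")))))
NP_ATTACH = ("S", ("NP", ("N", "John")),
             ("VP", ("V", "saw"),
                    ("NP", ("NP", ("N", "Mary")),
                           ("PP", ("P", "with"), ("NP", ("N", "binoculars"))))))

names = {VP_ATTACH: "VP attachment", NP_ATTACH: "NP attachment"}
for tree in rank([NP_ATTACH, VP_ATTACH], pcfg):
    print(f"{names[tree]}: P = {tree_probability(tree, pcfg):.3e}")
```

With this toy treebank the VP attachment scores 1/384 and the NP attachment 1/3072, so the VP reading is ranked first. The NP reading loses only because it uses one extra rule, NP -> NP PP, with probability 1/8; the words "saw", "Mary" and "binoculars" play no part in the decision. That insensitivity to lexical information is a well-known weakness of the plain PCFG model and part of the motivation for the lexicalized models the abstract mentions.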
Similar resources
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Exploring HPSG-based Treebanks for Probabilistic Parsing HPSG grammar extraction
We describe a method for the automatic extraction of a Stochastic Lexicalized Tree Insertion Grammar from a linguistically rich HPSG Treebank. The extraction method is strongly guided by HPSG–based head and argument decomposition rules. The tree anchors correspond to lexical labels encoding fine–grained information. The approach has been tested with a German corpus achieving a labeled recall of...
Efficacy of Beam Thresholding, Unification Filtering and Hybrid Parsing in Probabilistic HPSG Parsing
We investigated the performance efficacy of beam search parsing and deep parsing techniques in probabilistic HPSG parsing using the Penn treebank. We first tested the beam thresholding and iterative parsing developed for PCFG parsing with an HPSG. Next, we tested three techniques originally developed for deep parsing: quick check, large constituent inhibition, and hybrid parsing with a CFG chun...
Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain
This paper describes a method of adapting a domain-independent HPSG parser to a biomedical domain. Without modifying the grammar and the probabilistic model of the original HPSG parser, we develop a log-linear model with additional features on a treebank of the biomedical domain. Since the treebank of the target domain is limited, we need to exploit an original disambiguation model that was tra...
Journal title:
- Language and Linguistics Compass
Volume 2, Issue
Pages: -
Publication date: 2008